Wayne County and Grant County
SARIMAX-Based Power Outage Prediction During Extreme Weather Events
Ye, Haoran, Sun, Qiuzhuang, Yang, Yang
This study develops a SARIMAX-based system for short-term power outage forecasting during extreme weather events. Using hourly data from Michigan counties that pairs outage counts with comprehensive weather features, we implement a systematic two-stage feature engineering pipeline: data cleaning to remove zero-variance and unknown features, followed by correlation-based filtering to eliminate highly correlated predictors. The selected features, augmented with temporal embeddings, multi-scale lag features, and weather variables together with their lags, serve as exogenous inputs to the SARIMAX model. To address data irregularity and numerical instability, we standardize the inputs and adopt a hierarchical fitting strategy: sequential optimization methods are tried first, the model is automatically downgraded to ARIMA when convergence fails, and historical mean-based predictions act as a final safeguard. The model is optimized separately for short-term (24-hour) and medium-term (48-hour) forecast horizons, with RMSE as the evaluation metric. Our approach achieves an RMSE of 177.2, an 8.4% improvement over the baseline method (RMSE = 193.4), thereby validating the effectiveness of our feature engineering and robust optimization strategy for predicting outages caused by extreme weather.
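The two-stage feature filtering and the lag augmentation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the correlation threshold of 0.9, the greedy keep-first ordering, the caller-supplied set of "unknown" feature names, and the lag set (1, 24, 168 hours, i.e. hourly, daily, weekly) are all hypothetical choices.

```python
import math
from typing import Dict, Iterable, List, Optional, Set


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length, non-constant series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def select_features(features: Dict[str, List[float]], unknown: Set[str],
                    threshold: float = 0.9) -> List[str]:
    """Two-stage filter (assumed details): cleaning, then correlation pruning."""
    # Stage 1: drop features flagged as unknown and features with zero variance.
    stage1 = [name for name, values in features.items()
              if name not in unknown and max(values) != min(values)]
    # Stage 2: greedily keep a feature only if its |r| with every
    # already-kept feature stays below the (assumed) threshold.
    kept: List[str] = []
    for name in stage1:
        if all(abs(pearson(features[name], features[k])) < threshold for k in kept):
            kept.append(name)
    return kept


def lag_features(series: List[float],
                 lags: Iterable[int] = (1, 24, 168)) -> Dict[str, List[Optional[float]]]:
    """Multi-scale lags; hours without enough history are filled with None."""
    return {f"lag_{k}": [series[i - k] if i >= k else None for i in range(len(series))]
            for k in lags}


# Toy data (hypothetical): temp_f is a linear transform of temp, const never varies.
demo = {
    "temp": [1.0, 2.0, 3.0, 4.0],
    "temp_f": [33.8, 35.6, 37.4, 39.2],
    "const": [5.0, 5.0, 5.0, 5.0],
    "wind": [2.0, 1.0, 4.0, 3.0],
    "unk": [1.0, 0.0, 1.0, 0.0],
}
print(select_features(demo, unknown={"unk"}))  # ['temp', 'wind']
print(lag_features([1.0, 2.0, 3.0, 4.0], lags=(1, 2)))
```

The greedy pass keeps whichever of two correlated features appears first; ranking candidates by their correlation with the outage target before pruning would be a reasonable alternative.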
- North America > Trinidad and Tobago > Trinidad > Arima > Arima (0.25)
- Asia > Singapore (0.05)
- North America > United States > New York > New York County > New York City (0.05)
- Information Technology > Data Science > Data Quality (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.67)
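The hierarchical fitting strategy in the SARIMAX abstract (sequential optimizers, downgrade to ARIMA, historical-mean fallback) is essentially an ordered chain of attempts. The sketch below shows only that control flow, assuming each fitter signals failure by raising an exception. The fitters `sarimax_lbfgs`, `sarimax_powell`, and `arima_persistence` are hypothetical stubs; in a real pipeline each might wrap a statsmodels SARIMAX or ARIMA fit with a different optimizer. The last lines check the reported RMSE improvement.

```python
import math
from typing import Callable, List, Sequence, Tuple

Forecaster = Callable[[Sequence[float], int], List[float]]


class ConvergenceError(RuntimeError):
    """Raised by a (hypothetical) fitter when optimization does not converge."""


def historical_mean_forecast(history: Sequence[float], horizon: int) -> List[float]:
    # Final safeguard: repeat the mean of the observed history.
    mean = sum(history) / len(history)
    return [mean] * horizon


def forecast_with_fallback(history: Sequence[float], horizon: int,
                           attempts: Sequence[Forecaster]) -> Tuple[str, List[float]]:
    """Try each fitter in order; use the historical mean if all of them fail."""
    for attempt in attempts:
        try:
            return attempt.__name__, attempt(history, horizon)
        except (ConvergenceError, ArithmeticError, ValueError):
            continue  # downgrade to the next, simpler attempt
    return "historical_mean", historical_mean_forecast(history, horizon)


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


# Hypothetical stubs standing in for real model fits.
def sarimax_lbfgs(history, horizon):
    raise ConvergenceError("SARIMAX with L-BFGS did not converge")


def sarimax_powell(history, horizon):
    raise ConvergenceError("SARIMAX with Powell did not converge")


def arima_persistence(history, horizon):
    return [history[-1]] * horizon  # stand-in: naive persistence forecast


chain = [sarimax_lbfgs, sarimax_powell, arima_persistence]
print(forecast_with_fallback([10.0, 12.0, 14.0], 2, chain))      # ('arima_persistence', [14.0, 14.0])
print(forecast_with_fallback([10.0, 12.0, 14.0], 2, chain[:2]))  # ('historical_mean', [12.0, 12.0])

# Relative improvement reported in the abstract: (193.4 - 177.2) / 193.4.
print(round((193.4 - 177.2) / 193.4 * 100, 1))  # 8.4
```

Returning the name of the attempt that succeeded makes it easy to log how often each forecast horizon falls back to the simpler models.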
Lightweight Knowledge Representations for Automating Data Analysis
Sterbentz, Marko, Barrie, Cameron, Hooshmand, Donna, Shahi, Shubham, Dutta, Abhratanu, Pack, Harper, Zhao, Andong Li, Paley, Andrew, Einarsson, Alexander, Hammond, Kristian
The principal goal of data science is to derive meaningful information from data. To do this, data scientists develop a space of analytic possibilities and reach their information goals by drawing on their knowledge of the domain, the available data, the operations that can be performed on those data, the algorithms and models that consume the data, and how all of these facets interact. In this work, we take the first steps toward automating a key aspect of the data science pipeline: data analysis. We present an extensible taxonomy of data analytic operations that spans domains and datasets, as well as a method for codifying domain-specific knowledge that links this taxonomy to actual data. We validate the taxonomy's functionality by implementing a system that uses it, together with domain labelings for 8 distinct domains, to automatically generate a space of answerable questions and associated analytic plans. In this way, we produce information spaces over data that enable complex analysis and search, paving the way for fully automated data analysis.
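To make the idea of linking an operation taxonomy to data through domain labels concrete, here is a minimal sketch under assumptions. The operation names, semantic types, question templates, and the `crime_rate` labeling are all hypothetical and are not the paper's taxonomy. Each operation declares the semantic types it consumes; a domain labeling assigns types to columns; every type-compatible combination yields an answerable question paired with a simple plan (operation plus arguments).

```python
from dataclasses import dataclass
from itertools import product
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Operation:
    name: str                   # hypothetical operation name, e.g. "average"
    arg_types: Tuple[str, ...]  # semantic types the operation consumes
    template: str               # natural-language question template


# Hypothetical domain labeling: column name -> semantic type.
LABELS: Dict[str, str] = {"state": "entity", "year": "time", "crime_rate": "measure"}

# Hypothetical slice of an operation taxonomy.
OPERATIONS: List[Operation] = [
    Operation("average", ("measure", "entity"), "What is the average {0} per {1}?"),
    Operation("trend", ("measure", "time"), "How does {0} change over {1}?"),
    Operation("map", ("measure", "location"), "How is {0} distributed across {1}?"),
]


def question_space(ops: List[Operation],
                   labels: Dict[str, str]) -> List[Tuple[str, Tuple[str, ...], str]]:
    """Return (operation, arguments, question) for every answerable combination."""
    by_type: Dict[str, List[str]] = {}
    for column, sem_type in labels.items():
        by_type.setdefault(sem_type, []).append(column)
    questions = []
    for op in ops:
        # An operation is answerable only if every argument type has a column.
        candidates = [by_type.get(t, []) for t in op.arg_types]
        for args in product(*candidates):
            questions.append((op.name, args, op.template.format(*args)))
    return questions


for plan in question_space(OPERATIONS, LABELS):
    print(plan)
```

Because this labeling has no `location` column, the `map` operation yields no question, which illustrates how the labeling restricts the space to answerable questions.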
- North America > United States > Illinois > DuPage County (0.14)
- North America > United States > California > San Francisco County > San Francisco (0.14)
- North America > United States > Washington (0.04)
- Research Report (0.50)
- Workflow (0.48)
- Law (1.00)
- Government > Regional Government > North America Government > United States Government (1.00)
- Education (1.00)
Clustering US Counties to Find Patterns Related to the COVID-19 Pandemic
Brown, Cora, Milstein, Sarah, Sun, Tianyi, Zhao, Cooper
When COVID-19 first started spreading and quarantine was implemented, the Society for Industrial and Applied Mathematics (SIAM) Student Chapter at the University of Minnesota-Twin Cities began a collaboration with Ecolab to apply our skills as data scientists and mathematicians to extracting useful insights from pandemic-related data. This collaboration consisted of multiple groups working on different projects. In this write-up, we focus on using clustering techniques to find groups of similar counties in the US and on using those groups to better understand the pandemic. Our team for this project consisted of University of Minnesota students Cora Brown, Sarah Milstein, Tianyi Sun, and Cooper Zhao, with help from Ecolab Data Scientist Jimmy Broomfield and University of Minnesota student Skye Ke. In the sections below, we describe all of the work done for this project. In Section 2, we list the data we gathered and the feature engineering we performed. In Section 3, we describe the metrics we used to evaluate our models. In Section 4, we explain the methods we used to interpret the results of our various clustering approaches. In Section 5, we describe the different clustering methods we implemented. In Section 6, we present the results of our clustering techniques along with relevant interpretation. Finally, in Section 7, we offer concluding remarks comparing the different clustering methods.
- North America > United States > California > Los Angeles County > Los Angeles (0.14)
- North America > United States > Michigan > Wayne County > Wayne (0.04)
- North America > United States > Texas > Dallas County > Dallas (0.04)
- Health & Medicine > Epidemiology (0.86)
- Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.63)
- Health & Medicine > Therapeutic Area > Immunology (0.63)